Beware the Null Hypothesis: Critical Value Tables for Evaluating Classifiers

Authors

  • George Forman
  • Ira Cohen

Abstract

Scientists regularly decide the statistical significance of their findings by determining whether they can, with sufficient confidence, rule out the possibility that their findings could be attributed to random variation—the ‘null hypothesis.’ For this, they rely on tables with critical values pre-computed for the normal distribution, the t-distribution, etc. This paper provides such tables (and methods for generating them) for the performance metrics of binary classification: accuracy, F-measure, area under the ROC curve (AUC), and true positives in the top ten. Given a test set of a certain size, the tables provide the critical value for accepting or rejecting the null hypothesis that the score of the best classifier would be consistent with taking the best of a set of random classifiers. The tables are appropriate to consult when a researcher, practitioner or contest manager selects the best of many classifiers measured against a common test set. The risk of the null hypothesis is especially high when there is a shortage of positives or negatives in the testing set (irrespective of the training set size), as is the case for many medical and industrial classification tasks with highly skewed class distributions.
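The null hypothesis described above compares the best observed score with the best of a set of random classifiers, so its critical values can be estimated by simulation. Below is a minimal Monte Carlo sketch of that idea. It is not the paper's method or tables. It rests on three assumptions:

- A "random classifier" assigns i.i.d. uniform scores that are independent of the true labels. The paper's exact definition and its generation methods may differ.
- The function names `critical_values` and `auc_from_scores` are hypothetical.
- Only AUC and true positives in the top ten are covered. Accuracy and F-measure would additionally need an assumed decision threshold.

```python
import numpy as np


def auc_from_scores(scores, labels):
    """AUC for one or more score vectors (last axis = test cases) via the Mann-Whitney U statistic.

    Assumes continuous scores, so ties occur with probability zero.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    ranks = scores.argsort(axis=-1).argsort(axis=-1) + 1  # 1-based ranks; lowest score gets rank 1
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[..., labels == 1].sum(axis=-1) - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def critical_values(n_pos, n_neg, n_classifiers, alpha=0.05, trials=2000, top=10, seed=0):
    """Hypothetical Monte Carlo estimate of critical values under the best-of-random null.

    Returns the (1 - alpha) quantiles of (a) the best AUC and (b) the best count of
    true positives among the `top` highest-scored test cases, each taken over
    `n_classifiers` random classifiers evaluated on the same test set.
    """
    rng = np.random.default_rng(seed)
    labels = np.r_[np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)]
    best_auc = np.empty(trials)
    best_tp = np.empty(trials)
    for t in range(trials):
        # Assumption: each row is one random classifier scoring cases independently of their labels.
        scores = rng.random((n_classifiers, labels.size))
        best_auc[t] = auc_from_scores(scores, labels).max()
        # Indices of the `top` highest-scored cases for every classifier.
        top_idx = np.argsort(-scores, axis=1)[:, :top]
        best_tp[t] = labels[top_idx].sum(axis=1).max()
    return np.quantile(best_auc, 1 - alpha), np.quantile(best_tp, 1 - alpha)


if __name__ == "__main__":
    # Example: a skewed test set with 20 positives and 200 negatives, best of 20 classifiers.
    auc_crit, tp_crit = critical_values(n_pos=20, n_neg=200, n_classifiers=20)
    print(f"95% critical AUC: {auc_crit:.3f}, critical TP in top 10: {tp_crit:.1f}")
```

If the best classifier scores above these values, the best-of-random null hypothesis can be rejected at level `alpha`. Rerunning with fewer positives or more classifiers raises both critical values. This is consistent with the abstract's point that skewed test sets make the null hypothesis especially risky.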

Similar sources

Testing a Point Null Hypothesis against One-Sided for Non Regular and Exponential Families: The Reconcilability Condition to P-values and Posterior Probability

In this paper, the reconcilability between the P-value and the posterior probability in testing a point null hypothesis against the one-sided hypothesis is considered. Two essential families of distributions, non-regular and exponential, are studied. It was shown that, for a non-regular family of distributions, in some cases it is possible to find a prior distribution function under which P-va...

Comparison between Frequentist Test and Bayesian Test to Variance Normal in the Presence of Nuisance Parameter: One-sided and Two-sided Hypothesis

This article is concerned with the comparison of the P-value and a Bayesian measure for the variance of a Normal distribution with the mean as a nuisance parameter. First, the P-value of the null hypothesis is compared with the posterior probability when a fixed prior distribution is used and the sample size increases. In the second stage, the P-value is compared with the lower bound of the posterior probability when the ...

Moving Beyond Traditional Null Hypothesis Testing: Evaluating Expectations Directly

This mini-review illustrates that testing the traditional null hypothesis is not always the appropriate strategy. Half in jest, we discuss Aristotle's scientific investigations into the shape of the earth in the context of evaluating the traditional null hypothesis. We conclude that Aristotle was actually interested in evaluating informative hypotheses. In contemporary science the situation is ...

Default “Gunel and Dickey” Bayes factors for contingency tables

The analysis of R×C contingency tables usually features a test for independence between row and column counts. Throughout the social sciences, the adequacy of the independence hypothesis is generally evaluated by the outcome of a classical p-value null-hypothesis significance test. Unfortunately, however, the classical p-value comes with a number of well-documented drawbacks. Here we outline an...

Analysis of Two-way Contingency Tables

Pearson’s test and likelihood ratio test are two of the classical tests frequently used for testing the equality of r distributions using the data in the contingency tables. Presently we suggest an alternative test for testing the equality of distributions. It is found that the alternative test has, in general, more satisfactory levels of the acceptance probabilities under the null hypothesis, ...

Publication date: 2005